Why So Pessimistic? Estimating Uncertainties for Offline RL through Ensembles, and Why Their Independence Matters

Neural Information Processing Systems

Motivated by the success of ensembles for uncertainty estimation in supervised learning, we take a renewed look at how ensembles of $Q$-functions can be leveraged as the primary source of pessimism for offline reinforcement learning (RL). We begin by identifying a critical flaw in a popular algorithmic choice used by many ensemble-based RL algorithms, namely the use of shared pessimistic target values when computing each ensemble member's Bellman error. Through theoretical analyses and construction of examples in toy MDPs, we demonstrate that shared pessimistic targets can paradoxically lead to value estimates that are effectively optimistic. Given this result, we propose MSG, a practical offline RL algorithm that trains an ensemble of $Q$-functions with independently computed targets based on completely separate networks, and optimizes a policy with respect to the lower confidence bound of predicted action values. Our experiments on the popular D4RL and RL Unplugged offline RL benchmarks demonstrate that on challenging domains such as antmazes, MSG with deep ensembles surpasses well-tuned state-of-the-art methods by a wide margin. Additionally, through ablations on benchmark domains, we verify the critical significance of using independently trained $Q$-functions and study the role of ensemble size. Finally, as using separate networks per ensemble member can become computationally costly with larger neural network architectures, we investigate whether efficient ensemble approximations developed for supervised learning can be similarly effective, and demonstrate that they do not match the performance and robustness of MSG with separate networks, highlighting the need for further work on efficient uncertainty estimation directed at RL.
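
To make the described scheme concrete, below is a minimal PyTorch sketch of the core idea; the network sizes, ensemble size, and pessimism coefficient beta are illustrative assumptions, not the authors' implementation. Each ensemble member bootstraps only from its own target network, and pessimism enters solely through the lower confidence bound used to train the policy.

```python
# Sketch of MSG's core idea (assumed details: network sizes, beta, ensemble
# size; this is not the authors' code).
import copy
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def make_ensemble(obs_dim, act_dim, n=8):
    q_nets = [QNet(obs_dim, act_dim) for _ in range(n)]
    targets = [copy.deepcopy(q) for q in q_nets]  # one target net per member
    return q_nets, targets

def critic_loss(q_nets, targets, batch, policy, gamma=0.99):
    obs, act, rew, next_obs, done = batch
    with torch.no_grad():
        next_act = policy(next_obs)
    total = 0.0
    for q, q_targ in zip(q_nets, targets):
        with torch.no_grad():
            # Independent targets: member i bootstraps only from its own
            # target network; no shared pessimistic aggregate appears here.
            y = rew + gamma * (1.0 - done) * q_targ(next_obs, next_act)
        total = total + ((q(obs, act) - y) ** 2).mean()
    return total

def lcb(q_nets, obs, act, beta=4.0):
    # Pessimistic value for the policy update: ensemble mean minus
    # beta times the ensemble standard deviation.
    qs = torch.stack([q(obs, act) for q in q_nets])
    return qs.mean(0) - beta * qs.std(0)
```

The policy would then be updated to maximize this lower confidence bound on dataset states (offline methods typically add a behavioral regularizer as well). The key contrast with the shared-target variant criticized above is that no min or lower-confidence-bound aggregate over the ensemble ever appears inside the Bellman targets.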


Review for NeurIPS paper: Deep Variational Instance Segmentation

Neural Information Processing Systems

Weaknesses: The authors often call their method a "one-step approach" and criticize other methods for using "heuristic postprocessing". I don't think the authors should be making these comments, as there is a separate network to classify the masks. And I don't really buy the justification at the end of Page 6 that the proposed method needs to verify fewer masks than a typical two-stage method (e.g., Mask R-CNN) and is thus a "one-stage" method. Rather, I think the authors could highlight more strongly the fact that their method does not require any anchors or region proposals, as I think this is a strong argument to be making. I also think the paper could benefit from an analysis of the number of instances the network can predict. That is, if the network was trained with a maximum of K instances in the training set, can it correctly predict more than K instances at test time?


Decoupling Value and Policy for Generalization in Reinforcement Learning

Raileanu, Roberta, Fergus, Rob

arXiv.org Artificial Intelligence

Standard deep reinforcement learning algorithms use a shared representation for the policy and value function. However, we argue that more information is needed to accurately estimate the value function than to learn the optimal policy. Consequently, the use of a shared representation for the policy and value function can lead to overfitting. To alleviate this problem, we propose two approaches which are combined to create IDAAC: Invariant Decoupled Advantage Actor-Critic. First, IDAAC decouples the optimization of the policy and value function, using separate networks to model them. Second, it introduces an auxiliary loss which encourages the representation to be invariant to task-irrelevant properties of the environment. IDAAC shows good generalization to unseen environments, achieving a new state-of-the-art on the Procgen benchmark and outperforming popular methods on DeepMind Control tasks with distractors. Moreover, IDAAC learns representations, value predictions, and policies that are more robust to aesthetic changes in the observations that do not change the underlying state of the environment.
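
As a minimal sketch of the decoupling component alone (omitting IDAAC's advantage head and its auxiliary invariance loss), the following illustrates fully separate policy and value networks in PyTorch; the architecture details are assumptions, not the authors' implementation.

```python
# Sketch of decoupled policy/value networks (assumed architecture; IDAAC's
# invariance loss and advantage head are omitted for brevity).
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=64):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.Tanh(),
        nn.Linear(hidden, hidden), nn.Tanh(),
        nn.Linear(hidden, out_dim))

class DecoupledActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        # Two independent encoders: gradients from the value loss never
        # touch the policy's representation, and vice versa.
        self.policy_net = mlp(obs_dim, n_actions)
        self.value_net = mlp(obs_dim, 1)

    def forward(self, obs):
        logits = self.policy_net(obs)        # parameterizes the policy
        value = self.value_net(obs).squeeze(-1)  # trained on returns
        return logits, value
```

Because the two encoders share no parameters, value-function gradients cannot shape the policy's representation, which is precisely the overfitting pathway the abstract attributes to shared representations.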


Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning?

Nachum, Ofir, Tang, Haoran, Lu, Xingyu, Gu, Shixiang, Lee, Honglak, Levine, Sergey

arXiv.org Artificial Intelligence

Hierarchical reinforcement learning has demonstrated significant success at solving difficult reinforcement learning (RL) tasks. Previous works have motivated the use of hierarchy by appealing to a number of intuitive benefits, including learning over temporally extended transitions, exploring over temporally extended periods, and training and exploring in a more semantically meaningful action space, among others. However, in fully observed, Markovian settings, it is not immediately clear why hierarchical RL should provide benefits over standard "shallow" RL architectures. In this work, we isolate and evaluate the claimed benefits of hierarchical RL on a suite of tasks encompassing locomotion, navigation, and manipulation. Surprisingly, we find that most of the observed benefits of hierarchy can be attributed to improved exploration, as opposed to easier policy learning or imposed hierarchical structures. Given this insight, we present exploration techniques inspired by hierarchy that achieve performance competitive with hierarchical RL while at the same time being much simpler to use and implement.


Learning Objectives for Treatment Effect Estimation

Nie, Xinkun, Wager, Stefan

arXiv.org Machine Learning

We develop a general class of two-step algorithms for heterogeneous treatment effect estimation in observational studies. We first estimate marginal effects and treatment propensities to form an objective function that isolates the heterogeneous treatment effects, and then optimize the learned objective. This approach has several advantages over existing methods. From a practical perspective, our method is very flexible and easy to use: In both steps, we can use any method of our choice, e.g., penalized regression, a deep net, or boosting; moreover, these methods can be fine-tuned by cross-validating on the learned objective. Meanwhile, in the case of penalized kernel regression, we show that our method has a quasi-oracle property, whereby even if our pilot estimates for marginal effects and treatment propensities are not particularly accurate, we achieve the same regret bounds as an oracle who has a priori knowledge of these nuisance components. We implement variants of our method based on both penalized regression and convolutional neural networks, and find promising performance relative to existing baselines.
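
To make the two-step recipe concrete, here is a minimal scikit-learn sketch of a penalized-regression variant; the choice of learners, the fixed alpha, and the omission of cross-fitting (which the full method would use for the nuisance estimates) are simplifying assumptions, not the authors' implementation. Step 2 minimizes the residual-on-residual objective sum_i ((Y_i - m(X_i)) - (W_i - e(X_i)) * tau(X_i))^2, rewritten here as a weighted regression.

```python
# Sketch of the two-step learned objective (illustrative learners/settings;
# not the authors' implementation, and no cross-fitting for brevity).
import numpy as np
from sklearn.linear_model import Lasso, LassoCV, LogisticRegressionCV

def two_step_cate(X, W, Y):
    # Step 1: pilot estimates of the marginal outcome m(x) ~ E[Y | X = x]
    # and the treatment propensity e(x) ~ P(W = 1 | X = x).
    m_hat = LassoCV().fit(X, Y).predict(X)
    e_hat = LogisticRegressionCV().fit(X, W).predict_proba(X)[:, 1]
    # Step 2: minimize sum_i ((Y_i - m_hat_i) - (W_i - e_hat_i) * tau(X_i))^2.
    # Algebraically this equals a regression of the residual ratio onto X
    # with weights (W_i - e_hat_i)^2, so any weighted learner can play tau.
    pseudo_outcome = (Y - m_hat) / (W - e_hat)
    weights = (W - e_hat) ** 2
    return Lasso(alpha=0.01).fit(X, pseudo_outcome, sample_weight=weights)

# Hypothetical usage on synthetic data where the true effect is 1 + X[:, 1]:
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
W = rng.binomial(1, 0.5, size=2000)
Y = X[:, 0] + W * (1 + X[:, 1]) + rng.normal(size=2000)
tau_hat = two_step_cate(X, W, Y).predict(X)
```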


Robots Podcast #233: Geometric Methods in Computer Vision, with Kostas Daniilidis

Robohub

In this episode, Jack Rasiel speaks with Kostas Daniilidis, Professor of Computer and Information Science at the University of Pennsylvania, about new developments in computer vision and robotics. Daniilidis' research team is pioneering new approaches to understanding the 3D structure of the world from simple and ubiquitous 2D images. They are also investigating how these techniques can be used to improve robots' ability to understand and manipulate objects in their environment. Daniilidis puts this in the context of current trends in robot learning and perception, and speculates on how it will help bring more robots from the lab to the "real world". How does bleeding-edge research become a viable product? Daniilidis speaks to this from personal experience, as an advisor to startups spun out of the GRASP Lab and Penn's Pennovation incubator. Kostas Daniilidis is the Ruth Yalom Stone Professor of Computer and Information Science at the University of Pennsylvania, where he has been on the faculty since 1998.